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Abstract 

Recovering  a  low-rank  matrix  from  some  of  its  linear  measurements  is  a  popular  problem  in  many  areas  of 
science  and  engineering.  One  special  case  of  it  is  the  matrix  completion  problem,  where  we  need  to  reconstruct 
a  low-rank  matrix  from  incomplete  samples  of  its  entries.  A  lot  of  efficient  algorithms  have  been  proposed  to 
solve  this  problem  and  they  perform  well  when  Gaussian  noise  with  a  small  variance  is  added  to  the  given 
data.  But  they  can  not  deal  with  the  sparse  random-valued  noise  in  the  measurements.  In  this  paper,  we 
propose  a  robust  method  for  recovering  the  low-rank  matrix  with  adaptive  outlier  pursuit  when  part  of  the 
measurements  are  damaged  by  outliers.  This  method  will  detect  the  positions  where  the  data  is  completely 
ruined  and  recover  the  matrix  using  correct  measurements.  Numerical  experiments  show  the  accuracy  of  noise 
detection  and  high  performance  of  matrix  completion  for  our  algorithms  compared  with  other  algorithms. 


1  Introduction 


Nowadays,  a  lot  of  real  world  models  can  be  categorized  as  matrix  completion  (MC)  problems,  such  as  video 
denoising  [13],  data  mining  and  pattern  recognitions  [9],  model  reduction  [10],  low-dimensional  embedding  [1  ] 
etc.  The  general  form  of  the  MC  problem  is: 


minimize 

,\'eRmM 


rank(A"),  s.t.  Xi}j  =  Mij  V(i,j)  €  fl, 


(1.1) 


where  we  are  given  some  entries  of  a  matrix  X  (the  set  Cl)  and  we  want  to  recover  it  with  its  rank  as  low  as 
possible.  rank(X)  is  defined  as  the  number  of  singular  values  of  X.  However,  solving  (1.1)  is  often  numerically 
expensive.  Hence  people  tend  to  consider  its  relaxation: 


minimize  IIXIL,  s.t.  Xi  a 

xeR"x» 


v(i,j)  e  ci. 


(1.2) 


Here  ||X||*  stands  for  the  nuclear  norm  of  X,  which  is  the  L,  norm  of  the  singular  values  cr.j(X),  i.e.  ||Aj|*  = 
E;=i  0j(X)  where  r  =  rank(X).  It  has  been  shown  in  [6,  5,  20]  that,  under  certain  reasonable  conditions, 
(1.2)  and  (1.1)  share  the  same  solution.  [20]  also  did  further  study  about  the  recovery  for  general  linear  operator 

A  :  Rmx"  -A  RC 


minimize  IIXIL,  s.t.  A(X)  =  y.  (1.3) 

A'eR”X“ 

Different  types  of  algorithms  have  been  proposed  to  solve  (1.2),  such  as  linearized  Bregman  method  [  ], 
fixed  point  and  Bregman  iterative  methods  [  8]  and  accelerated  proximal  gradient  algorithm  [2  ] .  [2  ]  solves  an 
unconstrained  version  of  (1.2): 


minimize  £t||X||*  +  ^\\Vn(X)  -  Vn{M)\\2F.  (1.4) 

x e  Rm  x  n  2 

Here  y  is  a  properly  tuned  parameter,  Vq  stands  for  the  projection  onto  the  subspace  of  matrices  with  nonzeros 
restricted  to  the  index  subset  Cl,  and  ||  •  ||_f,  the  Frobenius  norm,  is  defined  as 


Mr  = 


\ 


EE  i  a 


.12 


*= 1  3= 1 


(1.5) 
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for  any  matrix  A  =  (Aj  j)rnxn.  Most  of  the  existing  MC  algorithms  require  singular  value  decomposition  (SVD) 
in  each  iteration,  which  is  the  main  time  cost  in  these  algorithms.  In  order  to  get  rid  of  SVD  and  accelerate  the 
algorithm,  the  authors  in  [24]  proposed  a  new  method  LMaFit  (low-rank  matrix  fitting)  which  solves  a  slightly 
modified  version  of  (1.2): 


minimize  ||{7W—  Z\\2F  s.t.  Zij  =  Mi j,  V(i,j)  £  12.  (1.6) 

u,w,  z 

Here  U  £  Rmxfc,  W  £  Rkxn,  and  Z  £  Rmxn,  where  k  is  a  predicted  rank  bound.  With  an  appropriate  k,  it 
could  give  us  the  same  result  as  (1.2).  U,  W,  Z  can  be  updated  in  an  alternating  fashion.  Following  the  idea  of 
nonlinear  successive  over  relaxation  (SOR)  technique,  [24]  used  weighted  average  between  this  update  and  the 
previous  iterate  and  achieved  a  faster  convergence.  Recently,  people  have  optimized  this  model  and  derived  other 
efficient  algorithms,  such  as  RTRMC  (Riemannian  trust-region  for  matrix  completion)  [  ].  Other  approaches 
include  [16,  12].  We  refer  the  readers  to  these  references  for  more  details. 

In  practice,  there  will  always  be  noise  in  the  measurements  during  acquisition,  therefore,  a  robust  method 
for  solving  MC  problem  is  strongly  needed.  Almost  all  the  existing  algorithms  can  deal  with  additive  Gaussian 
noise  with  a  relatively  small  variance,  but  they  can  not  perform  stably  when  the  given  data  is  corrupted  by 
outliers  [14],  another  type  of  noise  which  often  appears  in  application.  For  example,  the  problem  of  anticipating 
people’s  preference  is  gaining  more  and  more  attention  nowadays.  We  are  often  asked  to  rate  various  kinds  of 
products,  such  as  movies,  books,  games,  or  even  jokes.  This  problem  is  to  use  incomplete  rankings  provided 
by  users  on  some  of  the  products  to  predict  their  preference  on  other  unrated  products.  It  is  typically  treated 
as  a  low-rank  MC  problem.  However,  as  the  data  collection  process  often  lacks  control  and  sometimes  a  few 
people  may  not  be  willing  to  provide  their  true  opinions,  the  acquired  data  may  contain  some  outliers.  Therefore, 
applying  the  regular  MC  algorithm  on  this  corrupted  data  may  not  lead  to  satisfactory  result. 

In  order  to  deal  with  this  case,  we  propose  a  method  using  adaptive  outlier  pursuit  (AOP)  which  adaptively 
detects  the  damaged  data  location  with  high  accuracy.  Without  the  effect  of  wrong  measurements,  the  recon¬ 
struction  performance  can  be  improved  a  lot.  This  AOP  technique  has  been  applied  to  image  denoising  in  [25] 
and  robust  1-bit  compressive  sensing  in  [26]  and  performed  remarkably  well.  Combining  this  technique  with  the 
existing  MC  algorithm,  our  method  is  able  to  reconstruct  the  exact  matrix  even  from  sparsely  corrupted  entries. 

This  paper  is  organized  as  follows.  We  will  describe  our  algorithm  together  with  other  popular  methods  for 
robust  matrix  completion  in  section  2.  Section  3  focuses  on  the  connection  between  our  problem  and  another 
robust  low-rank  matrix  approximation  model.  We  also  provide  extensive  study  in  section  4  on  the  case  when 
we  only  have  limited  information  about  the  noise.  The  performance  of  the  algorithms  is  shown  in  section  5.  We 
will  end  this  work  by  a  short  conclusion. 


2  Algorithm  description 

From  now  on,  let  us  assume  that  the  rank  r  is  given  in  advance,  i.e.  the  rank  estimate  k  is  set  to  be  r.  According 
to  massive  experiments,  the  model  (1.6)  proves  to  be  a  quite  efficient  way  to  deal  with  MC  problems  when  some 
information  about  the  rank  is  known  in  advance.  One  drawback  about  this  formulation  is  that  the  solution 
( U,W )  is  not  unique.  As  a  matter  of  fact,  for  any  r  x  r  invertible  matrix  A,  (UA,  A~1W)  is  another  pair  of 
solution.  Many  people  have  devoted  to  improve  this  model,  such  as  [7,  8,  15,  16,  19,  2,  22]. 

The  author  in  [  1]  combined  the  ideas  in  these  work  and  proposed  the  following  model  and  the  associated 
algorithm  RTRMC: 


minimize  -  Cf ,  {{UW)i  ,■  —  Mt  ?-)2 

weeM,ffeR«»  2  ^  J  '3  l’JJ 

(jj)en 


y  E  (Ctr)b- 


(2.1) 


Here  r  is  the  given  rank,  U  £  Rmxr  is  any  matrix  such  that  its  column  space  U  belongs  to  the  Grassmann 
manifold  Q(m,r).  The  confidence  index  Cij  >  0  is  introduced  for  each  observation,  and  A  is  a  weighted 
parameter.  A  Riemannian  trust-region  method,  GenRTR  [1]  was  used  to  solve  the  above  optimization  problem 
on  the  Grassmannian.  According  to  the  numerical  experiments,  RTRMC  outperforms  other  state-of-the-art 
algorithms  on  a  wide  range  of  problem  instances.  It  is  especially  efficient  for  rectangular  matrices  and  achieves 
a  much  smaller  relative  error. 

However,  its  performance  will  be  ruined  when  sparse  random- valued  noise  is  introduced  to  the  measurements. 
In  order  to  obtain  better  result,  adaptively  finding  the  error  locations  and  reconstructing  the  matrix  can  be 
combined  together  as  in  [25,  26].  Here  we  will  plant  this  idea  into  the  existing  model. 
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We  define  K  as  the  number  of  error  terms  in  the  given  data  and  derive  the  following  revised  model: 

w 

2 


minimise  ^  J2  Ci,jKi((uw)i,i  ~  Mi,j?  +  y  E  ( UW)h 

(*d)en 


s.t.  ^2  (!  “  Ai,f)  <  A ij  £  {0, 1}. 

(ij)en 


(2.2) 

(2.3) 


Here  A  £  Rmx™  is  a  binary  matrix  denoting  the  “correct”  data: 


f  1,  if  (i,j)  £  H,  Mjj  is  “correct”, 
(  0,  otherwise. 


(2.4) 


and  fl  is  a  subspace  of  C l  such  that  A ^  j  =  1  for  all  the  indices  in  f 1.  In  this  work,  we  only  consider  the  case  with 
sparsely  corrupted  measurements,  and  the  other  measurements  are  assumed  to  be  correct.  The  parameter  A  is 
set  to  be  10-8,  i.e. ,  the  term  j)0i^JW)‘j^  can  be  neglected.  All  the  entries  of  the  confidence  matrix  C  are 
chosen  to  be  1.  Hence  the  model  can  be  simplified  into: 

minimize  E  s.t.  v  (1  -  A id)  <  K,  A itj  £  {0, 1}.  (2.5) 

(i,j)en  (i,j)  en 


In  order  to  solve  this  non-convex  problem,  we  use  alternating  minimization  method,  which  splits  the  energy 
minimization  over  A  and  U ,  W  into  two  steps: 

•  Fix  A  and  update  U,  W.  We  need  to  solve  the  following  sub-problem: 


m*i£VVZe  ((UW)ij  —  Mij)2. 

(i,j)eo 


This  can  be  solved  with  RTRMC. 


(2.6) 


•  Fix  U,  W  and  update  A.  This  time  we  are  solving: 

minimize  ^  Aitj((UW)i,j  -  Mitj)2,  s.t.  E  (1  -  &i,j)  <  K,  Aitj  £  {0, 1}.  (2.7) 

This  problem  is  to  choose  |fl|  —  K  elements  with  least  sum  from  {({UW)ij  —  ( i,j )  £  H}.  Here  |fi| 

stands  for  the  number  of  elements  in  set  f 2.  Dehning  r  as  the  Kth  largest  term  in  that  set,  A  can  then  be 
calculated  by 


f  1,  if  (i,j)  £  Cl,  (( UW)itj  -  Mitj)2  <  t, 
\  0,  otherwise. 


(2.8) 


If  the  Kth  and  ( K  +  l)th  larges  terms  are  equal,  then  we  can  choose  any  A  such  that  X](t!))eo(l  ~  M.j)  =  K 
and 


min J{UW)itj  -  Mid)2  >  max_((C/IF)i,j  -  Mid)2.  (2.9) 

(i,j)  en 


In  each  iteration,  we  use  A  to  identify  the  location  of  outliers  based  on  the  newly  constructed  U  and  W.  This 
outlier  detection  technique,  defined  as  adaptive  outlier  pursuit  (AOP),  was  firstly  used  in  [25,  26].  Our  algorithm 
is  as  follows: 


Algorithm  1  RTRMC  with  AOP 
Input:  Cl,  Vn(M),  Miter  >  0,  r  >  0,  K  >  0,  C,  A. 

Initialization:  k  =  0,  A Lj  =  1  for  (i,j)  £  Cl  and  0  otherwise,  Cl  =  Cl. 
while  k  <  Miter  do 

Replace  Cl  in  (2.1)  with  Cl  and  update  Uk  and  Wk  with  RTRMC. 
Update  Afc  with  (2.8). 

Update  Cl  to  be  the  indices  in  Cl  where  A*  ■  =  1. 

If  this  new  Cl  is  the  same  as  the  old  Cl,  break. 
k  =  k  +  1. 

end  while 
return  UkWk . 
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This  algorithm,  together  with  other  two  methods,  SpaRCS  (sparse  and  low-rank  decomposition  via  com¬ 
pressive  sensing)  [23]  and  GRASTA  (Grassmannian  robust  adaptive  subspace  tracking  algorithm)  [11],  will  be 
studied  and  compared  with  extensive  numerical  experiments.  SpaRCS  is  a  recently  proposed  algorithm  which 
aims  at  recovering  low  rank  and  sparse  matrices  from  compressive  measurements  with  the  following  model: 

minimize  ||y  —  A{L  +  S)||2,  s.t.  rank(L)  <  r,  ||S'||o  <  K.  (2-10) 

Here  ||S'||o  stands  for  the  number  of  nonzero  entries  of  the  matrix  S.  In  our  test  this  linear  transformation 
A(L  +  S )  is  defined  as  the  vector  formed  by  the  entries  of  ( L  +  S)  in  Cl.  This  model  can  be  applied  to  solve  MC 
problem  with  sparsely  corrupted  entries.  We  can  form  the  vector  y  with  given  noisy  data.  S  can  be  treated  as 
the  matrix  recording  outliers,  and  L  is  the  low-rank  matrix  we  want  to  recover.  GRASTA,  a  robust  subspace 
tracking  algorithm,  is  designed  to  tackle  the  following  model: 

minimize  |7,n(*S')|i,  s.t.  Vn{UW  +  S)  =  Vn (M),  U  e  G{m,r).  (2.11) 

It  alternates  between  estimating  the  subspace  IA  with  Grassmannian  and  finding  the  optimal  W,  S  with  aug¬ 
mented  Lagrangian  function.  According  to  the  numerical  experiments  in  that  paper,  it  can  efficiently  recover  a 
low-rank  matrix  from  partial  measurements,  even  if  the  partially  observed  entries  are  corrupted  by  gross  outliers. 


3  Connection  between  (2.5)  and  (2.10) 

In  this  section,  we  will  show  the  equivalence  between  our  problem  and  (2.10)  with  specially  defined  A(-),  i.e. 

mmimize  ||'Pq(M  —  L  —  S')||i?,  s.t.  rank(L)  <  r,  ||S||o  <  K.  (3.1) 

We  can  change  ||  •  ||f  in  the  above  problem  to  ||  •  |||.  while  still  getting  the  same  solution.  Basically,  for  (3.1) 
we  are  given  partial  entries  of  a  matrix  (Mij  with  (i,j)  €  Cl)  and  we  want  to  represent  this  M  by  the  sum  of  a 
low-rank  matrix  L  and  a  sparse  matrix  S. 

If  a  matrix  pair  (L,  S)  satisfies  the  constraints  of  problem  (3.1),  we  can  define 


f  1,  if  (i,j)  e  Cl,  Syj  =  0, 
1  0,  otherwise. 


Hence  for  any  (i,  j)  £  Cl,  we  have  Mij  =  Lij  +  Sij  if  Ajj  =  0,  since  we  can  simply  set  Sij  =  M.;j  —  Lij  for 
fixed  L  without  violating  the  constraint  on  S.  If  A ij  =  1,  we  have  Sij  =  0,  thus  Mij  —  {Lij  +  Sij)  =  Mij  —  Lij. 
Therefore,  (3.1)  is  equivalent  to 


minimize 

l,s,  A 


Lij)2  s.t.  rank(L)  <  r,  ||S||o  <  K,  A  satisfies  (3.2). 


(3.3) 


From  the  relation  of  A  and  S  in  (3.2)  and  the  constraint  ||5||o  <  K,  we  know  the  constraints  for  S  and  A  in  the 
above  equation  can  be  replaced  by  the  constraint  on  A  in  (2.5).  On  the  other  hand,  we  know  any  matrix  L  with 
size  m  x  n  and  rank(L)  <  r  can  be  written  as  the  product  of  two  matrices  U  and  W,  where  U  €  Rmxr  and 
W  €  Rrxn.  Therefore,  (3.3)  is  the  same  as  (2.5).  And  this  instantly  leads  to  the  equivalence  of  (2.5)  and  (3.1). 


4  The  K  study 

In  practice,  the  exact  number  of  corrupted  entries  K  may  be  unknown.  If  K  is  underestimated,  some  damaged 
entries  will  still  be  used  to  solve  the  matrix  completion  problem,  which  will  induce  error  for  the  reconstructed 
matrix.  On  the  other  hand,  if  K  is  overestimated,  too  many  entries  are  removed  and  the  reconstructed  matrix 
may  not  be  unique  if  the  “new”  sampling  set  is  not  large  enough.  We  first  focus  on  the  case  when  K  is 
overestimated. 

When  I\  is  overestimated,  we  can  always  find  more  than  one  solution  of  problem  (2.5)  with  the  objective 
function  value  being  zero.  If  K  is  overestimated  by  a  small  number,  the  product  of  U  and  W  will  be  the  same 
for  all  the  solutions  and  we  are  able  to  recover  the  original  low-rank  matrix.  When  the  difference  is  greater 
than  a  certain  number,  UW  may  not  be  unique.  The  following  theorem  provides  a  sufficient  condition  for  the 
non-uniqueness  of  the  matrix  UW. 
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Theorem  1.  Suppose  we  are  given  p  entries  of  an  m  x  n  matrix  M ,  where  the  locations  of  these  data  are  chosen 
randomly.  We  know  in  advance  that  K  of  the  given  entries  are  corrupted.  Define  the  difference  between  our 
overestimated  K  value  and  the  real  K  value  as  A K.  If  A K  satisfies  A K  >  (p  —  AT)/ max(m,?z)  —  r  >  0,  then 
the  reconstructed  matrix  will  be  non-unique. 

The  above  theorem  provides  a  rough  bound  on  AT  estimate  in  order  to  guarantee  uniqueness  of  the  problem. 
In  practice,  what  we  care  about  is  how  much  we  can  overestimate  K  without  sacrificing  the  accuracy  of  our 
algorithm.  Since  K  is  closely  related  with  the  location  of  the  known  data,  we  want  to  understand  how  the  given 
entries  are  distributed  over  the  matrix.  In  the  following  theorem,  we  assume  that  whether  the  value  at  each 
entry  is  given  or  not  is  independent  and  identically  distributed.  Let  us  define  k \  as  the  number  of  given  entries 
in  the  ith  row  and  kj  as  the  number  of  given  entries  in  the  jth  column.  fcmin  denotes  the  minimum  of  all  the  k \ 
and  kj.  Through  extensive  numerical  tests,  we  notice  that  the  distribution  of  km ;n  is  similar  to  the  conditional 
distribution  of  fcmin  given  the  total  number  of  known  entries.  For  simplicity  we  use  the  former  one  to  replace  the 
conditional  distribution  and  arrive  at  the  following  theorem. 

Theorem  2.  Suppose  that  the  probability  of  each  entry  being  given  is  q  =  (p  —  K ) / (ran),  and  whether  the  entry 
is  known  or  not  is  independent  of  other  entries.  For  any  small  number  Pq  G  (0, 1),  let  us  define 

Ki  =  min (nq  -  log(l  -  (1  -  P0/2)1/m),  mq  -  log(l  -  (1  -  P0/2)1/n))  (4.1) 

K2  =  min(ng  —  —  2nq  log(l  —  (1  —  P0/ 2)1/™),  mq  —  yJ—2mq\og(l  —  (1  —  Po/2)1/n)).  (4.2) 

Then  with  at  most  Pq  probability  there  exists  one  row  or  column  in  this  matrix  with  at  most  max(/v i ,  K2 )  given 
entries. 

The  proof  of  the  above  two  theorems  can  be  found  in  the  appendix.  As  we  know,  the  uniqueness  of  MC 
problem  with  outliers  depends  on  a  lot  of  subtle  factors.  Here  we  want  to  derive  an  empirical  upper  bound  for 
A K  such  that  when  A K  is  bounded  by  this  value,  with  high  probability  our  algorithm  can  recover  the  exact 
matrix.  Considering  the  revised  sampling  set  (the  set  with  (p  —  K)  entries),  in  order  to  study  the  relationship 
between  this  upper  bound  and  the  number  of  given  entries  in  each  row  and  column,  we  design  the  following 
experiment.  We  first  fix  the  matrix  size  512  x  512.  For  each  rank  r,  the  sample  ratio  sr,  defined  as  p/(mn),  is 
chosen  as  the  smallest  number  which  could  guarantee  the  exact  matrix  recovery  when  the  real  AT  value  is  used 
as  input.  For  more  details  about  the  sr  value,  we  refer  the  readers  to  the  phase  transition  charts  in  Section  5. 
Labeling  the  minimum  of  all  the  and  kj  from  the  revised  sampling  set  as  A;min,  we  randomly  choose  10  positive 
integers  bounded  above  by  (fcmin  —  r)  and  treat  them  as  AK.  For  each  input  (K  +  AK),  the  error  of  the  recovered 
matrix  Mr  is  calculated  with  the  following  expression: 

Err  =  max  |Mj  j  —  (Mr)ij\.  (4-3) 

i,j 

If  the  error  is  less  than  10-4,  we  say  the  recovery  is  successful.  Otherwise,  we  label  it  as  a  failure.  The  results 
displayed  in  Table  1  come  from  the  average  of  20  different  tests  for  each  setting. 

Through  massive  experiments,  we  can  see  that  if  AA'  is  smaller  than  (fcmin  —  r),  our  algorithm  can  find  the 
correct  matrix  with  extremely  high  probability.  Hence  we  come  up  with  the  following  conclusion:  for  any  small 
number  Pq ,  we  can  find  two  values  Ad  and  K-2  according  to  Theorem  2  such  that  with  at  most  Pq  probability  there 
exists  one  row  or  column  with  at  most  max(A'i,  K2)  given  entries.  Then,  if  AK  is  less  than  (max(A'i,  K2)  —  r), 
with  high  probability  our  algorithm  will  return  the  exact  matrix  with  this  overestimated  K  input. 

In  application,  when  AT  is  overestimated,  according  to  the  above  conclusion  we  just  need  to  construct  a 
strategy  to  update  it  such  that  AA'  can  be  bounded  by  (fcmin  —  r).  Let  us  define  AT  as  the  estimated  K  value. 
One  intuitive  idea  is  to  check  the  value  of  the  Kth  largest  term  in  set  Sq  =  {(( UW)ij  —  Mij)2,  ( i,j )  G  fl}  in 
each  iteration.  If  this  value  is  less  than  the  tolerance  Ktoi ,  it  is  possible  that  some  of  the  deleted  data  are  not 
outliers,  and  we  can  update  K  to  be  the  number  of  terms  in  this  set  which  are  greater  than  Ktoi.  When  our 
algorithm  reaches  a  certain  stage,  we  can  calculate  the  minimum  number  of  entries  in  one  row  or  column  from 
the  sampling  set  with  (p—  AT)  elements  (label  as  fcm;n).  If  it  is  less  than  r,  we  update  K  =  K  +  km jn  —  r.  Here  we 
just  choose  the  smallest  decrease  in  K.  In  fact,  we  may  pick  a  larger  decrease  in  order  to  reduce  the  number  of 
outer  iterations.  On  the  other  hand,  when  K  is  underestimated,  we  need  to  increase  its  value  in  order  to  remove 
all  the  outliers.  As  we  know,  when  there  are  outliers  in  the  given  data,  it  is  quite  possible  that  we  are  not  able  to 
find  a  low-rank  matrix  with  entries  equalling  the  given  data  at  these  locations.  Based  on  the  fast  convergence^of 
our  algorithm,  if  at  certain  iteration  the  difference  between  the  entries  of  the  recovered  matrix  at  those  (p  —  K ) 
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rank  &  sample  rate 

avg  K/p 

avg  fcmin 

success  percentage 

rank=5,  sr=0.15 

0.01 

50.75 

100% 

0.03 

49.15 

100% 

0.05 

49.00 

100% 

0.07 

47.05 

100% 

0.09 

45.35 

100% 

rank=10,  sr=0.20 

0.01 

72.25 

100% 

0.03 

70.70 

100% 

0.05 

68.40 

100% 

0.07 

67.65 

100% 

0.09 

66.30 

100% 

rank=15,  sr=0.25 

0.01 

95.25 

100% 

0.03 

93.20 

100% 

0.05 

92.75 

100% 

0.07 

89.25 

100% 

0.09 

85.40 

100% 

rank=20,  sr=0.30 

0.01 

119.55 

100% 

0.03 

116.70 

100% 

0.05 

114.50 

100% 

0.07 

111.30 

100% 

0.09 

107.25 

100% 

rank=25,  sr=0.35 

0.01 

143.30 

100% 

0.03 

139.20 

100% 

0.05 

136.15 

100% 

0.07 

134.10 

100% 

0.09 

129.30 

100% 

Table  1:  The  K  study.  For  each  matrix  10  different  K  inputs  are  chosen  and  20  trials  are  conducted  for  each 
matrix  setting. 

locations  and  the  associated  input  data  are  greater  than  tolerance  e,  we  can  update  K  to  be  p\K  with  pi  >  1. 
Pi  should  be  chosen  properly.  When  it  is  too  small,  we  need  a  lot  of  steps  to  make  K  larger  than  the  exact  K, 
however,  when  pi  is  too  large,  K  may  go  far  above  the  exact  K,  and  more  iterations  are  required  to  decrease  its 
value.  Therefore,  we  arrive  at  the  following  algorithm. 
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Algorithm  2  RTRMC-AOP  with  K  update 

Input:  U,  InnerMiter,  OuterMiter  >  0,  r  >  0,  K  >  0,  C,  A,  Ktoi ,  pi,  e  . 

Initialization:  k,  l  =  0,  Ajj  =  1  for  (i,j)  £  and  0  otherwise,  D  =  fh 
while  l  <  OuterMiter  do 
while  k  <  InnerMiter  do 

Replace  fl  in  (2.1)  with  0  and  update  Uk  and  Wk  with  RTRMC. 

Find  the  I\ih  largest  term  in  set  Sq. 

If  it  is  less  than  Ktoi ,  update  K  to  be  the  number  of  elements  in  Sq  that  are  greater  than  Kto[. 
Update  Ak  with  (2.8). 

Update  fl  to  be  the  indices  in  U  where  Af  •  =  1. 

If  this  new  f 2  is  the  same  as  the  old  fi,  break. 
k  =  k  +  1. 

end  while^ 

Calculate  fcmjn  with  the  updated  K  value. 

If  kjcnin  ^  ^  5  K  =  A  +  &min  • 

If  kmin  >  r  and  the  function  value  of  (2.6)  is  less  than  e,  break. 

If  fcmin  >  r  and  the  function  value  is  greater  than  e,  K  =  p\K. 
k  =  0,  1  =  1  +  1. 

end  while 
return  UkWk ,  K. 


5  Numerical  results 

In  this  section  we  use  some  numerical  experiments  to  demonstrate  the  effectiveness  of  our  algorithm  (AOP  for 
short).  AOP,  together  with  SpaRCS  and  GRASTA  are  studied  and  compared. 

In  each  experiment,  the  original  matrix  M  is  generated  by  the  product  of  U  £  Rmxr  and  W  £  Rrx",  whose 
entries  follow  independent  and  identically  distributed  (i.i.d.)  Gaussian  distribution  with  variance  1.  We  denote 
the  largest  and  smallest  value  of  M  as  niL  and  ms-  p  entries  are  chosen  randomly  from  M  and  their  locations 
are  recorded  in  U.  We  then  pick  K  locations  randomly  from  fl  and  replace  the  values  at  these  locations  by  a 
random  number  from  The  corrupted  p  entries  and  U  are  used  as  input  in  our  code. 

We  first  use  phase  transition  charts  to  demonstrate  the  empirical  performance  of  our  Algorithm  1.  The  size  of 
the  matrix  is  chosen  to  be  m  =  n  =  512.  Different  values  of  r,  p  and  K  are  considered.  For  each  small  rectangle 
in  the  following  figure,  we  fix  the  value  of  r,  p  and  K,  and  applied  AOP  to  recover  the  low-rank  matrix.  If  the 
relative  error,  i.e.  \\Mr  —  M||i7/||M||p,  is  smaller  than  1CU3,  we  denote  the  test  as  “successful”.  20  different 
tests  are  conducted  for  each  setting  and  the  probability  of  successful  recovery  are  recorded.  Here  red  indicates 
recovery  success  and  blue  indicates  failure.  As  expected,  the  performance  gets  worse  when  we  decrease  p  or 
increase  r  and  K.  Similar  experiment  was  also  conducted  with  SpaRCS  in  [23]  (Figure  1),  and  by  comparing 
these  plots  we  can  see  clearly  that  under  the  same  condition,  it  is  much  easier  for  our  method  to  recover  the 
exact  matrix  than  SpaRCS. 

In  the  following  two  tests,  the  results  of  SpaRCS  and  GRASTA  are  also  shown  in  the  figures.  Let  us  assume 
the  K  value  is  known  in  advance,  i.e.  Algorithm  1  is  used  in  the  comparison.  We  will  compare  these  three  items 
(since  GRASTA  solves  a  different  problem,  only  the  relative  error  is  compared): 

1)  distance  between  P^(Mr)  and  P^(M),  i.e.  \\P^(Mr)  —  P^(M)\\f', 

2)  relative  error; 

3)  the  probability  of  correct  detections  of  corrupted  data  in  the  noisy  measurements. 

In  our  second  experiment,  we  set  m  =  n  =  500,  r  =  10,  and  examine  the  performance  of  these  algorithms  on 
data  with  different  noise  levels.  Here  the  noise  level  is  defined  as  K/p.  21  different  noise  levels  are  chosen  from 
0  to  0.1,  and  p  is  calculated  by  6r(m  +  n  —  r).  20  trials  are  performed  for  each  noise  level  and  the  mean  of  the 
above  three  items  are  recorded  in  Figure  2.  The  plots  demonstrate  that  in  these  comparisons  AOP  outperforms 
the  other  two  greatly  for  all  noise  levels.  According  to  the  relative  error  plot,  the  result  from  our  method  is 
always  around  10_1°  while  the  result  gained  by  the  other  two  algorithms  is  bounded  below  by  10-4.  Compared 
with  SpaRCS,  GRASTRA  is  slightly  more  stable  when  the  given  data  is  ruined  by  gross  outliers.  In  plot  (c), 
we  record  the  probabilities  of  correct  error  locations  detection.  From  the  graph  we  can  see  that  AOP  algorithms 
can  find  all  the  positions  of  corrupted  data  with  probability  1,  while  in  comparison  the  performance  of  SpaRCS 
detection  is  not  very  satisfying. 
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r=  5  r  =  10  r  =  15 


p/(M*N)  p/(M‘N)  p/(M'N) 


(a)  r=5 


(b)  r=10 


(c)  r=15 


r  =  20  r  =  25 


p/(M‘N)  p/(M*N) 

(d)  r=20  (e)  r=25 

Figure  1:  Phase  transitions  for  a  recovery  problem  of  size  512  x  512.  Aggregate  results  are  shown  over  20 
Monte-Carlo  runs  at  each  specification  of  r,  K  and  p.  Red  indicates  recovery  success  and  blue  indicates  failure. 


Figure  2:  Algorithm  comparison  on  corrupted  data  with  different  noise  levels,  (a)  Distance  between  P^(Mr)  and 
Pq(M)  vs  noise  level,  (b)  Relative  error  vs  noise  level,  (c)  Error  location  detection  vs  noise  level.  AOP  proves 
to  be  more  robust  to  contaminated  data  compared  with  others. 


Figure  3:  Algorithm  comparison  on  corrupted  data  with  different  noise  levels,  (a)  Distance  between  P^(Mr) 
and  Pq(M)  vs  rank,  (b)  Relative  error  vs  rank,  (c)  Error  location  detection  vs  rank.  AOP  yields  a  remarkable 
improvement  in  reducing  the  relative  error  and  finding  the  correct  error  locations  compared  with  others. 


Next  we  fix  m  =  n  =  500  and  the  noise  level  5%  and  change  r  between  2  and  40.  p  is  still  calculated  by 
6 r(m  +  n  —  r).  The  following  results  in  Figure  3  come  from  the  average  of  20  tests.  We  can  see  that  the  distance 
between  P^(Mr)  and  P^(M)  and  the  relative  error  tend  to  decrease  as  the  rank  of  M  increases.  Although  all 
the  algorithms  show  the  same  trend,  AOP  series  always  return  a  much  better  result  with  relative  error  staying 
around  10~n,  and  it  detects  the  true  error  locations  with  probability  1  in  all  the  tests.  The  relative  error  from 
GRASTA  is  bounded  below  by  10-5  and  when  the  rank  is  extremely  low,  the  relative  error  could  be  as  high  as 
10"2. 

In  the  previous  experiments,  we  assume  the  exact  K  value  is  always  given.  Now  let  us  study  the  case  when 
the  input  K  is  just  an  estimate  of  the  actual  number  of  errors.  This  time  we  fix  m  =  n  =  512,  and  examine  the 
performance  of  Algorithm  2  under  different  settings.  The  relative  error,  Err  defined  by  (4.3)  and  the  updated  K 
value  will  be  displayed  here.  In  Figure  (a)-(c),  we  set  r  =  10  and  pick  5  different  noise  levels  between  0.01  and 
0.09.  For  each  setting,  we  calculate  fcmin,  and  choose  21  different  AK  between  — 5fcmin  and  15fcmin.  Then  each 
( I\  +  AK)  is  used  as  the  input  K  value.  In  Figure  (d)-(f),  we  fix  the  noise  level  to  be  0.05  and  vary  r  from  5 
to  25.  Still,  21  different  AK  values  are  selected.  All  the  p  values  in  these  tests  are  chosen  the  same  as  Table  1. 
The  following  results  come  from  the  average  of  20  different  trials.  We  can  see  that  in  all  the  cases  our  AOP 
with  K  update  algorithm  can  detect  the  correct  number  of  outliers  with  high  probability  even  when  the  input 
K  differs  a  lot  from  its  real  value.  The  relative  error  plot  and  Err  plot  suggest  that  this  method  always  recovers 
the  matrices  with  extremely  high  accuracy. 


(b)  Err  plot 


(c)  The  updated  K 


5000 

4500' 

^  4000,, 

*  3500 

S  3000 
> 

2500- 
2000  - 
1500- 


-  r  =  IT 
>-r  -  21 
-r  =  2; 


(d)  Relative  error 


(e)  Err  plot 


AK/kmia 

(f)  The  updated  K 


Figure  4:  The  K  study.  For  (a)-(c)  we  fix  the  rank  and  vary  the  noise  level,  (a)  relative  error  vs  different  K 
input,  (b)  Err  vs  different  K  input,  (c)  K  output  vs  different  K  input.  For  (d)-(f)  we  fix  the  noise  level  and 
vary  the  rank  of  the  matrix,  (d)  relative  error  vs  different  K  input,  (e)  Err  vs  different  K  input,  (f)  K  output 
vs  different  K  input. 


6  Conclusion 

In  this  paper,  we  propose  a  method  for  exact  low-rank  matrix  completion  from  sparsely  corrupted  data  via 
adaptive  outlier  pursuit.  By  iteratively  detecting  the  damaged  measurements  and  recovering  the  matrix  from 
“correct”  measurements,  this  method  can  obtain  better  results  in  both  finding  the  noisy  measurements  and 
recovering  the  exact  matrix  when  random-valued  noise  is  introduced  in  the  measurements.  Our  algorithm  is 
implemented  and  compared  with  SpaRCS  and  GRASTA  in  the  numerical  experiments.  It  has  better  performance 
in  many  aspects  compared  to  the  other  two,  especially  in  detecting  all  the  outlier  locations.  When  the  exact 
value  of  the  number  of  outliers  is  not  provided,  the  AOP  with  K  update  algorithm  can  always  detect  the  correct 
number  of  outliers  and  recover  the  exact  matrix  in  all  the  cases  with  high  probability. 
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8  Appendix 

This  appendix  provides  the  mathematical  proofs  of  the  theoretical  results  in  Section  4. 

8.1  Proof  of  Theorem  1 

Proof.  When  we  overestimate  A ,  the  A  outliers  will  be  found  to  make  the  objective  function  0.  In  the  mean 
time,  some  non-outliers  (the  overestimated  AA'  entries)  are  also  considered  as  outliers  and  will  not  be  used  for 
matrix  completion.  Therefore,  we  only  need  to  consider  the  ( p  —  A )  correct  entries.  As  we  know,  if  the  number 
of  given  entries  in  one  row  (or  column)  is  less  than  r,  the  reconstructed  matrix  is  not  unique.  Since  AA'  of 
the  ( p  —  A)  known  entries  will  not  be  used  in  reconstructing  the  matrix,  when  AA  is  large  enough  to  make 
the  number  of  known  entries  in  one  row  (or  column)  less  than  r,  we  will  have  more  than  one  solution.  It  is 
easily  seen  that  the  smallest  number  of  known  entries  in  one  row  (or  column)  of  the  matrix  is  less  than  or  equal 
l(p-K)/m\  (or  \_{p  —  K)/n\),  here  is  the  largest  integer  that  does  not  exceed  x.  Without  loss  of  generality, 
let  us  assume  column  j  has  the  smallest  number  of  known  entries.  To  make  the  number  of  known  entries  in  this 
column  no  less  than  r,  the  smallest  number  of  entries  to  be  deleted  should  not  exceed  [(p  —  K)  jn\  —  r  +  1.  Thus 
if  A  A  is  greater  than  [(p  —  K)/n\  —  r,  the  reconstructed  matrix  will  not  be  unique.  Q 

8.2  Proof  of  Theorem  2 

Proof.  Since  the  probability  that  a  certain  location  is  chosen  is  fixed  and  equals  q  =  (jp—  K)/(mn).  In  addition, 
whether  one  entry  is  chosen  or  not  is  independent  of  other  entries.  The  number  of  known  entries  in  each  row  (or 
column)  of  this  matrix  follows  binomial  distribution.  For  each  row,  the  cumulative  distribution  can  be  expressed 
as 


F(x,  n,  q)  =  P(X<x)=Y^(n.\  q\  1  -  q)n~\  (8.1) 

i= o  '  ' 

where  X  is  the  number  of  known  entries  in  this  row. 

From  the  Hoeffding’s  inequality,  we  have 

F(k,  n,  q)  <  exp  ^2(ng^fc)2^  ,  (8.2) 

for  any  integer  k  <  nq.  Since  the  distribution  of  the  number  of  known  entries  in  each  row  is  independent,  we 
can  find  the  upper  bound  for  the  probability  that  there  exists  one  row  with  at  most  k  given  entries: 

P(minW,*2V -,«*)<*)<  l-(l-exp(-2AA>:))  :=P,.  (8.3) 

Here  k l  stands  for  the  number  of  given  entries  in  the  ith  row.  Similarly  the  upper  bound  for  the  probability  that 
there  exists  one  column  with  at  most  k  given  entries  can  be  expressed  as  follows: 

•  •  •  ,  *S)  <  «  <  1  -  (l  -  exp(-2b?lA£))  :=P,  (8.4) 


where  is  defined  as  the  number  of  given  entries  in  the  jth 


column.  Combing  these  two  together,  we  have 


P(min(fc[,  •  • 


r 

rm 


kt,kc2 


,  K)  <  k)  <  Pi  +  P2  <  2  max(Pi,  P2), 


(8.5) 


which  means  the  probability  that  there  exists  one  row  or  column  with  at  most  k  given  entries  can  be  bounded 
by  2max(Pi,P2). 
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Let  us  first  assume  P\  >  P2.  Defining 


P0  =  2  1  -  1  -  exp  -2 


( nq-kf 


we  then  have 


exp  ^—2 
k  =  nq  — 


(■ nq  —  ky 


=  1  —  (  1  —  — 


1/n 


y-^log(l-(l-P0/2)Vm). 


When  Pi  <  P2,  we  have 


k  =  mq  -  y log(l  -  (1  -  P0/2)1/n). 


(8.6) 

(8.7) 

(8.8) 

(8.9) 


We  define  Ad  as  the  minimal  of  these  two  values.  Hence  given  Pq,  the  probability  of  having  one  row  or  column 
with  at  most  Ad  entries  is  less  than  P0. 

Besides  the  Hoeffding’s  inequality,  we  also  have  Chernoff’s  inequality, 


F(k,  n,  q )  <  exp  — 


1  (nq  —  k)^ 


2  q  n 


In  this  case 


Pi  =  1  —  ^1  —  exp 
Pi  =  1-  (  1-exp  ( - 


1  (nq — ky 

2  p  n 

1  (mq  —  k )2 
2p  m 


After  similar  calculation,  we  have 


for  Pi  >  P2,  and 


k  =  nq  —  \J  —2nq\og(l  —  (1  —  Po/2)1/m) 
k  =  mq  —  \J — 2mglog(l  —  (1  —  Po/2)1/") 


(8.10) 

(8.11) 

(8.12) 

(8.13) 

(8.14) 


when  Pi  <  P2.  Similarly  we  define  AT2  to  be  the  smaller  one  of  these  two  values,  and  the  probability  of  having 
one  row  or  column  with  at  most  K2  entries  is  less  than  Pq.  Combining  the  results  from  two  inequalities  together, 
we  know  that  with  at  most  P0  probability  there  exists  one  row  or  column  with  at  most  max  (Ad ,  A’2)  given 
entries.  □ 
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